
speedup the inference of vit (gelu, rmsnorm and fa3 for H-series) and chunked prefill for multimodal #766


Merged: 18 commits merged into main from fix_chuned_prefill on Apr 2, 2025

Conversation

SangChengC (Contributor)

No description provided.

@SangChengC force-pushed the fix_chuned_prefill branch 2 times, most recently from f5e5bbd to 9d475cb on March 13, 2025 05:24
@shihaobai changed the title from "fix chunked prefill" to "speedup the inference of vit (gelu, rmsnorm and fa3 for H-series) and chunked prefill for multimodal" on Mar 28, 2025
@shihaobai requested a review from Copilot on March 28, 2025 09:12

Copilot AI left a comment

Pull Request Overview

This PR accelerates ViT inference by integrating optimized Triton kernels for gelu and rms norm operations, adds flash attention support for Hopper GPUs, and implements chunked prefill for multimodal scenarios. Key changes include:

  • Enhancements to VisualModelRpcServer and model.encode to support per-image maximum patch counts via max_num_list.
  • Updates in router, multimodal parameters, and memory cache logic to propagate and utilize a new max_num parameter.
  • Integration of Triton kernels for gelu and rms norm, along with adjustments in backend and preprocessing for multimodal inputs.
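
The overview above mentions Triton kernels for gelu and rms norm without showing them; for reference, below is a minimal sketch of an element-wise tanh-approximate GELU kernel in Triton. The kernel name, block size, and wrapper are illustrative assumptions and are not taken from this PR's actual kernels.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def gelu_kernel(x_ptr, y_ptr, n_elements, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized slice of the flattened tensor.
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask, other=0.0).to(tl.float32)
    # tanh-approximate GELU: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    # tanh(y) is expressed as 2 * sigmoid(2y) - 1 to stay within basic tl ops.
    inner = 0.7978845608028654 * (x + 0.044715 * x * x * x)
    y = 0.5 * x * (1.0 + (2.0 * tl.sigmoid(2.0 * inner) - 1.0))
    tl.store(y_ptr + offs, y, mask=mask)


def gelu(x: torch.Tensor) -> torch.Tensor:
    # x must live on a CUDA device for the Triton launch to work.
    y = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    gelu_kernel[grid](x, y, n, BLOCK=1024)
    return y
```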

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Summary per file:

  • lightllm/server/visualserver/model_infer/model_rpc.py: Propagates max_num_list to model.encode in forward for multimodal inference.
  • lightllm/server/router/model_infer/model_rpc.py: Passes the is_multimodal flag to the chunked prefill backend.
  • lightllm/server/router/model_infer/mode_backend/chunked_prefill/impl.py: Updates chunked prefill to accept an is_multimodal parameter.
  • lightllm/server/multimodal_params.py: Introduces the max_num parameter and corresponding logic.
  • lightllm/server/embed_cache/*.py: Adds a new API for max_num and updates the memory cache record structure.
  • lightllm/server/api_http.py: Adjusts multimodal image processing and token counting.
  • lightllm/models/vit/*: Modifies encode and layer inference functions to support the new gelu/rms norm kernels.
  • lightllm/models/internvl/*: Updates image token length calculations and preprocessing to include max_num.
Comments suppressed due to low confidence (2)

lightllm/server/embed_cache/utils.py:16

  • [nitpick] The parameter name 'img_str' is ambiguous because it may represent either a file path or a file-like stream. Consider renaming it to something that clearly indicates the expected input type, like 'image_input'.
def image2base64(img_str: str):

lightllm/server/api_http.py:251

  • Passing 'response.raw' (a stream) to image2base64 assumes that the function can handle file-like objects. Verify and document the accepted input types for image2base64 or adjust its implementation accordingly.
data = image2base64(response.raw)
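
To address both review notes above, one possible shape for the helper is to accept either a file path or a readable binary stream. The function name matches the one referenced in the review, but the body below is an illustrative assumption, not the PR's actual implementation.

```python
import base64


def image2base64(image_input):
    """Return a base64 string for either a file path (str) or a binary
    file-like object such as requests' response.raw (assumed to yield bytes)."""
    if isinstance(image_input, str):
        with open(image_input, "rb") as f:
            raw = f.read()
    else:
        raw = image_input.read()
    return base64.b64encode(raw).decode("utf-8")
```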

if self.tp_rank_id == 0:
    for i in range(len(images_uuids)):
        uid = images_uuids[i]
        max_num_list.append(self.cache_client.root.get_max_num(uid))

Copilot AI Mar 28, 2025

Currently, max_num_list is populated only when self.tp_rank_id == 0, which may result in an empty list for other ranks. Consider ensuring a consistent max_num_list is provided to self.model.encode for all cases.

Suggested change:
-        max_num_list.append(self.cache_client.root.get_max_num(uid))
+        max_num_list[i] = self.cache_client.root.get_max_num(uid)
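
One hedged way to keep max_num_list consistent across tensor-parallel ranks is to have rank 0 query the cache and broadcast the result. The helper below is a sketch under the assumption that torch.distributed is initialized; it is not the fix actually adopted in this PR.

```python
import torch.distributed as dist


def gather_max_num_list(self, images_uuids):
    # In this sketch only rank 0 talks to the embed cache service.
    if self.tp_rank_id == 0:
        max_num_list = [self.cache_client.root.get_max_num(uid) for uid in images_uuids]
    else:
        max_num_list = [None] * len(images_uuids)
    # Broadcast so every rank passes the same list to self.model.encode.
    if dist.is_initialized():
        dist.broadcast_object_list(max_num_list, src=0)
    return max_num_list
```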


@SangChengC closed this on Mar 28, 2025
@SangChengC reopened this on Mar 28, 2025
@SangChengC closed this on Mar 28, 2025
@SangChengC reopened this on Mar 28, 2025
@SangChengC force-pushed the fix_chuned_prefill branch 3 times, most recently from 3f76f9e to 52c6b99 on March 31, 2025 09:55
@ModelTC deleted a comment from Copilot AI on Mar 31, 2025
@@ -21,6 +22,7 @@ def __init__(self, **kwargs):
        self.image_h = 0

        self._preload_data = None
        self.extra_params = {"image_patch_max_num": kwargs.get("max_num", None)}
Collaborator

For generality.
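
As an illustration of that generality point, downstream code can pull per-image options out of the generic extra_params dict without adding a dedicated field per option. The key name follows the diff above; the helper and its default value are hypothetical.

```python
def resolve_patch_max_num(extra_params: dict, default: int = 12) -> int:
    """Read the per-image patch budget from the generic extra_params dict."""
    value = extra_params.get("image_patch_max_num")
    return default if value is None else int(value)


print(resolve_patch_max_num({"image_patch_max_num": 6}))  # 6
print(resolve_patch_max_num({}))                          # 12 (hypothetical default)
```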

@SangChengC force-pushed the fix_chuned_prefill branch 4 times, most recently from edb87de to 9e3ae23 on April 1, 2025 10:27
@SangChengC force-pushed the fix_chuned_prefill branch from 9e3ae23 to 01b3f68 on April 2, 2025 03:26
@SangChengC force-pushed the fix_chuned_prefill branch 3 times, most recently from cb7fd6d to 1f25c14 on April 2, 2025 04:58
@SangChengC force-pushed the fix_chuned_prefill branch from 1f25c14 to ea0fe0d on April 2, 2025 05:00
@hiworldwzj merged commit 750957f into main on Apr 2, 2025
1 check passed
@shihaobai deleted the fix_chuned_prefill branch on May 29, 2025 05:32